Automatic Disambiguation of Homographic Heterophone Pairs Containing Open and Closed Mid Vowels
Abstract
Vowel openness in Brazilian Portuguese has not yet been satisfactorily explored in the automatic classification of homographic heterophones (HH). We therefore aimed to develop and test a pilot classifier that assists in the automatic disambiguation of HH. For this purpose, we analyzed a set of 226 HH word pairs in which each member has a unique grammatical class and the members are distinguished by the alternating mid vowels [e, E] and [o, O]. The results showed that the rules proposed herein solve most disambiguation problems for HH word pairs containing mid vowels in the corpus analyzed, and that they can be applied in text-to-speech (TTS) and automatic speech recognition (ASR) applications. The data also revealed that non-verb classes predominate and that, for some word pairs, they account for as much as 95% of occurrences.
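As a rough illustration of what rule-based disambiguation of this kind can look like, the sketch below picks the mid vowel of a homograph from the preceding word. It is not the classifier described in the abstract: the lexicon entries, the single cue word, and the function name `classify` are all hypothetical, chosen only for illustration. The default to the non-verb reading is an assumption motivated by the predominance of non-verb classes reported above.

```python
# Hypothetical mini-lexicon: homograph -> grammatical class -> stressed mid vowel.
# Notation follows the abstract: "e"/"o" are closed, "E"/"O" are open mid vowels.
LEXICON = {
    "gosto": {"non_verb": "o", "verb": "O"},   # "o gosto" vs. "eu gosto"
    "jogo": {"non_verb": "o", "verb": "O"},    # "o jogo" vs. "eu jogo"
    "começo": {"non_verb": "e", "verb": "E"},  # "o começo" vs. "eu começo"
}

# Hypothetical cue: a first-person subject pronoun directly before the
# homograph suggests the verb reading. A real rule set would be far richer.
VERB_CUES = {"eu"}


def classify(tokens, index):
    """Return (class, vowel) for tokens[index], or None if it is not in LEXICON."""
    word = tokens[index].lower()
    if word not in LEXICON:
        return None
    previous = tokens[index - 1].lower() if index > 0 else ""
    # Assumed default: fall back to the non-verb reading when no cue fires.
    label = "verb" if previous in VERB_CUES else "non_verb"
    return label, LEXICON[word][label]


if __name__ == "__main__":
    print(classify("Eu gosto de música".split(), 1))
    print(classify("O gosto do café".split(), 1))
```

A TTS front end could use the returned vowel to choose between the two pronunciations of the homograph. A single preceding-word cue is far too weak for real text and only shows the general shape of such a rule.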
Similar resources
Acoustic phonetic properties of mid vowels in New Caledonian French
This paper investigates production of the mid vowels /e, ɛ, ø, œ, o, ɔ/ by four speakers of New Caledonian French (NCF). Formant and durational properties of these vowels are examined with respect to the type of syllable in which they occur. Results point to general adherence to the loi de position in NCF, such that the close-mid vowels occur in open syllables and the open-mid vowels occur in c...
Desambiguação de Homógrafos-Heterófonos por Aprendizado de Máquina em Português Brasileiro (A Machine Learning Approach for Homographic Heterophone Disambiguation in Brazilian Portuguese)
To improve the quality of the speech produced by a text-to-speech system, it is important to obtain the maximum amount of information from the input text that may help in this task. In this context, word sense disambiguation plays an important role and remains a central problem for natural language processing applications. This paper proposes to model the ambiguity of words as a supervised...
The Automatically Built up Homograph Dictionary a Component of a Dynamic Lexical System
Ambiguous word forms (often called "homonyms" or, in written language, "homographs") are known as obstacles in many fields of computational linguistics, especially in automatic documentation, content analysis, and mechanical translation. In this respect, two problems must be distinguished: 1) the detection of homographic word forms in the text, and 2) their disambiguation by analysis procedures. This ...
BuzzSaw at SemEval-2017 Task 7: Global vs. Local Context for Interpreting and Locating Homographic English Puns with Sense Embeddings
This paper describes our system participating in SemEval-2017 Task 7, for the subtasks of homographic pun location and homographic pun interpretation. For pun interpretation, we use a knowledge-based Word Sense Disambiguation (WSD) method based on sense embeddings. Pun-based jokes can be divided into two parts, each containing information about the two distinct senses of the pun. To exploit t...
Investigating the Role of Types of Homograph Contexts in Determining Similarity Between Documents
Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation, thereby improving the effectiveness of the results retrieved. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts can be divided into different types, wit...